Leveraging Multi-Domain Prior Knowledge in Topic Models

نویسندگان

  • Zhiyuan Chen
  • Arjun Mukherjee
  • Bing Liu
  • Meichun Hsu
  • Malú Castellanos
  • Riddhiman Ghosh
چکیده

Topic models have been widely used to identify topics in text corpora. It is also known that purely unsupervised models often result in topics that are not comprehensible in applications. In recent years, a number of knowledge-based models have been proposed, which allow the user to input prior knowledge of the domain to produce more coherent and meaningful topics. In this paper, we go one step further to study how the prior knowledge from other domains can be exploited to help topic modeling in the new domain. This problem setting is important from both the application and the learning perspectives because knowledge is inherently accumulative. We human beings gain knowledge gradually and use the old knowledge to help solve new problems. To achieve this objective, existing models have some major difficulties. In this paper, we propose a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains. Our evaluation results will demonstrate its effectiveness.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fast and Effective Framework for Lifelong Topic Model with Self-learning Knowledge

To discover semantically coherent topics from topic models, knowledge-based topic models have been proposed to incorporate prior knowledge into topic models. Moreover, some researchers propose lifelong topic models (LTM) to mine prior knowledge from topics generated from multi-domain corpus without human intervene. LTM incorporates the learned knowledge from multi-domain corpus into topic model...

متن کامل

Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models

Multi-dimensional latent text models, such as factorial LDA (f-LDA), capture multiple factors of corpora, creating structured output for researchers to better understand the contents of a corpus. We consider such models for clinical research of new recreational drugs and trends, an important application for mining current information for healthcare workers. We use a “three-dimensional” f-LDA va...

متن کامل

Expressive Knowledge Resources in Probabilistic Models

Title of dissertation: EXPRESSIVE KNOWLEDGE RESOURCES IN PROBABILISTIC MODELS Yuening Hu, Doctor of Philosophy, 2014 Dissertation directed by: Professor Jordan Boyd-Graber iSchool, Umiacs Understanding large collections of unstructured documents remains a persistent problem. Users need to understand the themes of a corpus and to explore documents of interest. Topic models are a useful and ubiqu...

متن کامل

Adding Semantic Information into Data Models by Learning Domain Expertise from User Interaction

Interactive visual analytic systems enable users to discover insights from complex data. Users can express and test hypotheses via user interaction, leveraging their domain expertise and prior knowledge to guide and steer the analytic models in the system. For example, semantic interaction techniques enable systems to learn from the user’s interactions and steer the underlying analytic models b...

متن کامل

Language Based Mapping of Science Assessment Items to Skills

Knowledge of the association between assessment questions and the skills required to solve them is necessary for analysis of student learning. This association, often represented as a Q-matrix, is either handlabeled by domain experts or learned as latent variables given a large student response data set. As a means of automating the match to formal standards, this paper uses neural text classif...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013